Capstone Project - The Battle of Neighborhoods in Toronto

1. Introduction :

Opening a business can be exciting, but it must be planned carefully if it is to flourish. Several factors need to be taken into account. Above all, you should think carefully about the following:

  1. Where do you want to open the business?
  2. Which type of business do you want to open?

The main purpose of this project is to propose which type of business would be beneficial for a client to open in a specific area, Toronto. We will also locate similar businesses in the corresponding neighborhoods to determine whether people are likely to stop by near this location.


[The entire code can be found at this link: https://github.com/jihea-katie-lee/Coursera_Capstone/blob/master/Code_The_Battle_of_Neighborhoods.ipynb]

2. Data Section :

To find the best type of business and its location, we will use the following source of information:

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

From this page, we obtained the postal code, borough, and neighborhood. Using the 'geocoder' package, we obtained the latitude and longitude of each neighborhood. Since we only consider the 'Toronto' region, the other boroughs were ignored.
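A minimal sketch of the filtering step, assuming the scraped table is held as (postal code, borough, neighborhood) rows and that the 'Toronto' region means boroughs whose names contain "Toronto". The rows and the helper name `keep_toronto_boroughs` are illustrative, and the geocoding step with 'geocoder' is left out because it needs network access:

```python
from typing import NamedTuple

class PostalRow(NamedTuple):
    postal_code: str
    borough: str
    neighborhood: str

# Illustrative rows in the shape of the Wikipedia table (not the full scraped data).
rows = [
    PostalRow("M5A", "Downtown Toronto", "Regent Park, Harbourfront"),
    PostalRow("M4E", "East Toronto", "The Beaches"),
    PostalRow("M3A", "North York", "Parkwoods"),
]

def keep_toronto_boroughs(rows):
    """Keep only rows whose borough name contains 'Toronto'; other boroughs are ignored."""
    return [row for row in rows if "Toronto" in row.borough]

toronto_rows = keep_toronto_boroughs(rows)
print([row.postal_code for row in toronto_rows])  # ['M5A', 'M4E']
```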

[Figure]

3. Methodology :

In this project, we want to find out which types of restaurants are most common in Toronto. Therefore, we focus only on venues whose category is a type of restaurant.
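The restriction to restaurants can be expressed as a simple category filter. This sketch assumes that a venue counts as a restaurant when its Foursquare category name ends with "Restaurant"; the venue rows and the helper name `restaurant_venues` are hypothetical:

```python
# Hypothetical (neighborhood, venue, category) rows, not real Foursquare data.
venues = [
    ("The Beaches", "Venue A", "Japanese Restaurant"),
    ("The Beaches", "Venue B", "Coffee Shop"),
    ("Harbourfront", "Venue C", "Italian Restaurant"),
    ("Harbourfront", "Venue D", "Park"),
]

def restaurant_venues(venues):
    """Keep venues whose category name ends with 'Restaurant' (an assumed rule)."""
    return [venue for venue in venues if venue[2].endswith("Restaurant")]

print(restaurant_venues(venues))
```

The rule is deliberately simple: it would miss food venues whose category names do not end in "Restaurant", such as "Pizza Place", so an explicit list of categories may be more accurate.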

To do so, we apply 'K-means clustering', an unsupervised learning algorithm that groups unlabeled data into k clusters. It allows us to place neighborhoods with similar venue types into the same segment.
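To illustrate what k-means does, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard k-means procedure. It is not the notebook's implementation, which this write-up does not show; the feature vectors are hypothetical shares of three restaurant categories per neighborhood:

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate an assignment step and a centroid update step."""
    rng = random.Random(seed)
    # Initialize the centroids with k data points chosen at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids

# Hypothetical share of [Japanese, Italian, Chinese] restaurants in four neighborhoods.
features = [
    (0.6, 0.2, 0.2),
    (0.5, 0.3, 0.2),
    (0.1, 0.1, 0.8),
    (0.0, 0.2, 0.8),
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary (they depend on the random initialization); what matters is which neighborhoods end up sharing a label.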

To segment and explore the neighborhoods, we need data on the neighborhoods in Toronto along with their latitude and longitude coordinates. We use the Foursquare API, which provides a database of millions of places, to explore the neighborhoods and retrieve venues of the desired categories. From this database, we can obtain the following information:

1. Neighborhood
2. Neighborhood's Latitude & Longitude
3. Venue
4. Venue's Latitude & Longitude
5. Venue Category
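A sketch of how a request for one neighborhood could be built and how the five fields listed above could be pulled out of the reply. It assumes Foursquare's legacy v2 `venues/explore` endpoint and its JSON layout (`response`, then `groups[0]`, then `items`, each holding a `venue`), which newer Foursquare APIs have replaced. The credentials, radius, limit, and version date are placeholders, the helper names are hypothetical, and no request is actually sent:

```python
from urllib.parse import urlencode

# Assumption: Foursquare's legacy v2 explore endpoint; newer Foursquare APIs use different URLs.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def explore_url(client_id, client_secret, lat, lng, radius=500, limit=50, version="20200612"):
    """Build (but do not send) an explore request for venues around one neighborhood."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (placeholder)
        "ll": f"{lat},{lng}",  # neighborhood latitude,longitude
        "radius": radius,      # search radius in meters (placeholder)
        "limit": limit,        # maximum number of venues (placeholder)
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

def venue_rows(neighborhood, lat, lng, response_json):
    """Flatten a response into (neighborhood, lat, lng, venue, venue lat, venue lng, category)."""
    rows = []
    for item in response_json["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"], venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows

# A hand-written response in the assumed v2 shape, used instead of a live API call.
fake_response = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A",
               "location": {"lat": 43.66, "lng": -79.39},
               "categories": [{"name": "Japanese Restaurant"}]}},
]}]}}
url = explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 43.65, -79.38)
rows = venue_rows("Harbourfront", 43.65, -79.38, fake_response)
print(url)
print(rows)
```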

[Figure]

[Figure]

[Figure]

(Clusters 0, 1, 2, 3 = Purple, Blue, Yellow, Red, respectively)

After analyzing the data, we saw that some venues are labeled simply "Restaurant", without specifying the type of food. We exclude these venues so that the remaining categories are easier to compare. In addition, some clusters represent only one borough and have no distinctive characteristics of their own, so we also reduce the number of clusters (the value of k) for further evaluation.
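Both adjustments can be sketched in a few lines: dropping venues whose category is exactly "Restaurant", and spotting clusters that hold a single member, which is one possible signal that k is too large. The helper names are hypothetical, and the single-member check is only one interpretation of a cluster without a character of its own:

```python
from collections import Counter

def drop_generic_restaurants(categories):
    """Remove venues whose category is just 'Restaurant', since it names no type of food."""
    return [c for c in categories if c != "Restaurant"]

def singleton_clusters(labels):
    """Return the cluster ids that contain exactly one member."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n == 1)

categories = ["Restaurant", "Japanese Restaurant", "Italian Restaurant", "Restaurant"]
print(drop_generic_restaurants(categories))  # ['Japanese Restaurant', 'Italian Restaurant']
print(singleton_clusters([0, 0, 1, 2, 2, 3]))  # [1, 3]
```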

4. Results :

Re-evaluation of each neighborhood along with its top 5 most common venues
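One way to compute each neighborhood's top venues, sketched under the assumption that venues arrive as (neighborhood, category) pairs; the data and the helper name `top_venues` are hypothetical. Ranking by raw counts gives the same order as ranking by the mean of one-hot encoded categories, since each mean is just the count divided by the neighborhood's total number of venues:

```python
from collections import Counter, defaultdict

def top_venues(venues, n=5):
    """Map each neighborhood to its n most frequent venue categories, most common first."""
    counts_by_hood = defaultdict(Counter)
    for neighborhood, category in venues:
        counts_by_hood[neighborhood][category] += 1
    return {hood: [cat for cat, _ in counts.most_common(n)]
            for hood, counts in counts_by_hood.items()}

# Hypothetical (neighborhood, category) pairs.
venues = [
    ("Harbourfront", "Japanese Restaurant"),
    ("Harbourfront", "Italian Restaurant"),
    ("Harbourfront", "Japanese Restaurant"),
    ("The Beaches", "Chinese Restaurant"),
]
print(top_venues(venues))
```

`Counter.most_common` orders tied categories by when they were first seen, so ties can come out in a different order if the input order changes.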

[Figure]

[Figure]

(Clusters 0, 1, 2 = Purple, Green, Red, respectively)

[Figure]

[Figure]

[Figure]

5. Discussion :

From the plots above, we can see that most of the boroughs have Japanese and Italian restaurants. Among all of the '1st Most Common Venue' entries, Japanese restaurants appear most often, followed by Italian restaurants. Since the main purpose of this project was to suggest a type of restaurant and its location in Toronto, the client can use this information to decide which restaurant to open.
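The tally behind a statement like "Japanese restaurants appear most often as the 1st Most Common Venue" can be sketched as a count over each neighborhood's top category. The input mapping and the helper name `tally_first_venue` are hypothetical, and the output does not reproduce the project's results:

```python
from collections import Counter

def tally_first_venue(top_by_hood):
    """Count how often each category is a neighborhood's 1st Most Common Venue."""
    return Counter(cats[0] for cats in top_by_hood.values() if cats)

# Hypothetical top-venue lists, not the project's results.
top_by_hood = {
    "Neighborhood A": ["Japanese Restaurant", "Italian Restaurant"],
    "Neighborhood B": ["Japanese Restaurant", "Thai Restaurant"],
    "Neighborhood C": ["Italian Restaurant", "Japanese Restaurant"],
}
print(tally_first_venue(top_by_hood).most_common())
# [('Japanese Restaurant', 2), ('Italian Restaurant', 1)]
```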

  • Because there are already many Japanese and Italian restaurants in the area, a new Japanese or Italian restaurant would likely face strong competition. Therefore, it is better to consider another type of restaurant.
  • Additionally, cluster 0 has more restaurants nearby, which suggests that more people come to this area to eat out. We can therefore suggest that the client open a restaurant in that area, which is mostly located in downtown Toronto.
  • Using the graph above, we can suggest that the client open a restaurant near downtown Toronto whose type is less common, but not the least common, so that it still matches the preferences of restaurant customers.

6. Conclusion :

In this project, 103 different Canadian postal codes were used to find the corresponding latitudes and longitudes from the dataset. Among them, the neighborhoods in the 'Toronto' region were evaluated to find a type of restaurant likely to be profitable, as requested by the client. Using the Foursquare API, the top 5 most common restaurant venues in each neighborhood were evaluated with the k-means clustering algorithm. The client can now decide which type of restaurant to open in Toronto based on the information obtained through the data analysis in this project.